Pseudotime analysis

1. Introduction

This vignette gives a quick overview on pseudotime analysis and how to apply it on a vgm object. In pseudotime analysis single cells are ordered along a trajectory or lineage and each cell receives a specific pseudotime value which represents its position along that trajectory. This can be used for instance in analysing cell differentiation processes or observing changes in gene expression along lineage specification.

We mainly focus on the usage of two pseudotime libraries: Monocle (version 2 and 3) and slingshot. Monocle3 learns from the fact that cells are differentially expressed and sliced along states of differentiation. These transcriptional changes can be learned from scRNA-seq and used to projects trajectories on dimensionality reduction (on e.g. UMAP plot). After that a root for pseudotime can be chosen and it assigns to each cell a pseudotime value based on the trajectories and root node.

The previous version of Monocle3, Monocle2, uses the DDRTree algorithm to learn a tree structure based on the data and projects clustered cells along the branches of the tree. These trajectories can be used to assign a root branch and pseudotime will be assigned along the branches starting from the root branch.

The slingshot library is similar to Monocle3 in a way that it uses dimensionality reduction to project trajectories. It learns relations between clustered data and infers lineages on a global structure. A main difference to monocle is that slingshot can infer multiple lineages based on one root and each lineage assigns pseudotime values to every single cell for that particular lineage.

2. Loading the data

#Downloading PlatypusDB raw data in a list format
#For structure of PlatypusDB links, please refer to the PlatypusDB vignette 
data_for_VGM <-PlatypusDB_fetch(PlatypusDB.links =c(
  "shlesinger2022b/TFH.LCMV.acuteLCMV.10dpi.S1/ALL"
  #"shlesinger2022b/TFH.LCMV.chronicLCMV.50dpi.S7/ALL"
  #"shlesinger2022b/TFH.LCMV.chronicLCMV.30dpi.S6/ALL"
  #"shlesinger2022b/TFH.LCMV.chronicLCMV.10dpi.S4/ALL"
  ),
  load.to.enviroment =  F,
  load.to.list = T,
  combine.objects = T)

#integrate VDJ and GEX of the two datasets into a vgm object
VGM <- VDJ_GEX_matrix(Data.in = data_for_VGM, verbose = T,
                      GEX.integrate = T,
                      VDJ.combine = T,
                      integrate.GEX.to.VDJ = T,
                      integrate.VDJ.to.GEX = T,
                      exclude.GEX.not.in.VDJ = F,
                      filter.overlapping.barcodes.GEX = F,
                      filter.overlapping.barcodes.VDJ = F,
                      get.VDJ.stats = F,
                      subsample.barcodes = F,
                      trim.and.align = F,
                      #integration.method = "harmony",
                      group.id = names(data_for_VGM))

After the data is loaded and renamed according to their data type (renaming not shown, only for reasons of clarity) let’s have a look at the UMAP:

Seurat::DimPlot(VGM[[2]], reduction = 'umap' ,group.by = "group_id")

3. Gene expression and ProjecTILs plot

To have a better understanding about the distribution of various cell types in the data set we can visualize the expression of cell markers and their cell type annotations.

3.1 Gene expression plots

First we define a set of genes and the function GEX_gene_visualisation() visualizes the expression of the gene set in our data in three different plots.

gene_visual_output <- GEX_gene_visualization(VGM$GEX,
                       gene_set = c('CD3E', 'CD4', 'CD8A', 'CD44'),
                       group.by = "group_id")

gene_visual_output[[1]]
gene_visual_output[[2]]
gene_visual_output[[3]]

3.2 ProjecTILs plots

As a next step we are interested in the cell types which are present in our data. For that, we use the GEX_projecTILS() function. ProjecTILs compares our data with a reference map of different cell types. By doing so it lays projections onto the reference map indicating which cell types are mostly present in our data.

projecTILs_output <- GEX_projecTILS(ref_path = refpath, 
               GEX = VGM$GEX, 
               split_by = "group_id",
               filtering =  TRUE, 
               NA_cells = FALSE)


#update VGM$GEX with the new updated GEX object
VGM[[2]] <- projecTILs_output[[1]]

projecTILs_output[[3]]

## [[1]]

Here we can see that ProjecTILs projects the data onto the reference map. The black circles and triangles represent the cells which are present in the VGM. We can see that a majority of our cells are either Tfh_Effector or Tfh_Memory cells. projecTILs_output[[1]] is the updated GEX object with additional meta data (“ProjecTILS_assignment”) containing the ProjecTILs assignments which we will be using for the next steps.

4. Trajectories and pseudotime plot using Monocle3

As a next step we use Monocle3 to reconstruct trajectories. You can choose the reduction method (Umap, tsne or pca) and the argument how the cells should be grouped color.cells.by. Here we use Umap for dimensionality reduction and cluster the data based on their cell types inferred from the projecTILs output.

trajectories_output <- GEX_trajectories(GEX = VGM[[2]],
                                        color.cells.by = 'ProjecTILS_assignment',
                                        reduction.method = 'UMAP',
                                        labels.per.group = 2,
                                        group.label.size = 3)

trajectories_output[[2]]
trajectories_output[[3]]

trajectories_output[[3]] shows the UMAP reduced cell clusters and the trajectories connecting different cell clusters. This plot will be used to plot pseudotime along the trajectories. For that, pseudotime values need to be assigned to every single cell by choosing a single or multiple root nodes. For this you can define root.nodes. In our case we choose a node which is located among T_cmp cells (e.g. c(‘Y_93’, ‘Y_157’)).

pseudotime_output <- GEX_pseudotime_trajectory_plot(
  trajectories_output[[1]],
  root.nodes = c('Y_93', 'Y_157'))

pseudotime_output[[2]]

#update VGM[[2]] with the inferred pseudotime from pseudotime_output
VGM[[2]] <- SeuratObject::AddMetaData(VGM[[2]], SummarizedExperiment::colData(pseudotime_output[[1]])$pseudotime,
                               col.name = "pseudotime")

## [1] "calculating pseudotime trajectories for each cell depending on the chosen root nodes"

4.1 Visualizing gene expressions along the trajectory plot

If you’re interested in a gene or a set of genes and in their expression levels along trajectories you can use the same function from above with the parameter genes. This will visualize the UMAP and the trajectory with the specified genes and their expression levels.

interesting_genes <- c('CD3E', 'CD4', 'CD8A', 'CD44')
trajectories_output <- GEX_trajectories(VGM[[2]],
                                        color.cells.by = 'ProjecTILS_assignment',
                                        reduction.method = 'UMAP',
                                        genes = interesting_genes, 
                                        group.label.size = 2)

trajectories_output[[2]]
trajectories_output[[3]]

Here again, root.nodes can be specified for the pseudotime plot as above.

5.Clonotypes along pseudotime

The VDJ_GEX_clonotyme() function also uses the Monocle3 workflow to infer trajectories and pseudotime. It projects gene expression of specific genes ( highlight.genes) along pseudotime as well as the expression and distribution of clonotypes (top.N.clonotypes) along pseudotime. Additionally it plots different cell characteristics along pseudotime (ridgeline.separator can be any meta data input such as ‘group_id’, ‘seurat_clusters’ etc.). Here we choose the default clonotypes which displays the top.N.clonotypes = 5 along pseudotime.

clonotyme_output <- VDJ_GEX_clonotyme(vdj.gex.matrix.output = VGM, version="v3",
                                      highlight.genes= c('CD3E','CD4',
                                                         'CD8A','CD44'),
                                      root.nodes = c('Y_93', 'Y_157'),
                                      colors = color_palette,
                                      top.N.clonotypes = 5,
                                      ridgeline.separator = 'clonotype')

clonotyme_output[[2]]
clonotyme_output[[3]]
clonotyme_output[[4]]

clonotyme_output[[2]] shows the distribution of the clustered cell types along pseudotime. clonotyme_output[[3]] shows the expression of the highlight.genes along pseudotime and cells colored based on the seurat_clusters identity. clonotyme_output[[4]] shows the expression and distribution of clonotypes for the specified genes along pseudotime.

5.1 Functional cell types along pseudotime

Here the ridgeline.separator was set to the cell types annoted by PRojecTILs.

clonotyme_output <- VDJ_GEX_clonotyme(vdj.gex.matrix.output = VGM, version="v3",
                                      highlight.genes= c('CD3E','CD4',
                                                         'CD8A','CD44'),
                                      root.nodes = c('Y_93', 'Y_157'),
                                      colors = color_palette,
                                      top.N.clonotypes = 3,
                                      ridgeline.separator = 'ProjecTILS_assignment')

clonotyme_output[[2]]

5.2 Clonal expansion along pseudotime

The VGM_expanded_clones() function can be used to infer whether a cell belongs to an expanded clonotype or not. It can be of interest to see how expanded and not-expanded cells behave along pseudotime. For that, we use ridgeline.separator = "expanded_clonotype_frequency".

VGM <- VGM_expanded_clones(VGM = VGM,
                           add.to.VDJ = TRUE,
                           add.to.GEX = TRUE,
                           expansion.threshold = 1)

clonotyme_output_expanded <- VDJ_GEX_clonotyme(vdj.gex.matrix.output = VGM, version="v3",
                                      highlight.genes = c('CD3E', 'CD4', 'CD8A', 'CD44'),
                                      root.nodes = c('Y_93', 'Y_157'),
                                      top.N.clonotypes = 3,
                                      ridgeline.separator = "expanded_clonotype_frequency")

clonotyme_output_expanded[[2]]

5.3 Module scores along pseudotime

The VDJ_GEX_clonotyme function can also be used in combination with module scores for specific genes. For that we create a list of vectors of interesting genes (gene.list).

First, let’s get the module scores for each gene vector ( In our case there are two). Seurat::AddModuleScore() adds the score for each module to the VGM. In a next steps these module scores can be visualized over pseudotime by using the parameter genes.for.module.score.

Important: please keep the naming convention (name = paste0(names(gene.set), "_module")) in AddModuleScore() as shown. Otherwise VDJ_GEX_clonotyme() will not recognize the genes.for.module.score input!

gene.set <- list(c('CD3E', 'CD4'),c('CD8A', 'CD44')) # list of vectors of genes
names(gene.set) <- c("a_genes", "b_genes") #name gene sets  

VGM[[2]] <- Seurat::AddModuleScore(object = VGM[[2]],features = gene.set, name = paste0(names(gene.set), "_module"))

clonotyme_output_modules <- VDJ_GEX_clonotyme(vdj.gex.matrix.output = VGM,
                                              version="v3",
                                              colors = color_palette,
                                              root.nodes = c('Y_93', 'Y_157'),
                                              genes.for.module.score = gene.set)

clonotyme_output_modules[[4]]

## [[1]]
## 
## [[2]]

The output of VDJ_GEX_clonotyme stays the same as above expect for Element [[4]] which shows the module scores along pseudotime.

6. Trajectories and pseudotime plot using slingshot

There are other trajectory inference methods than Monocle3 and the following function uses the slingshot library and shows a different approach to trajectory inference. It clusters again the cells and instead of inferring pseudotime trajectories, it displays multiple lineage trajectories based on the clustered cells. The root cluster has to be defined beforehand, here we take cluster 5 since that’s where the most T_cmp cells are located as our starting root.

Seurat::DimPlot(VGM[[2]], reduction = "umap",
        group.by = "seurat_clusters", pt.size = 0.5, label = TRUE)

lineage_trajectory_output <- GEX_lineage_trajectories(VGM[[2]], cluster.num = 5)